34 research outputs found

    Finding regulatory DNA motifs using alignment-free evolutionary conservation information

    Get PDF
    As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for using conservation information for TF binding site discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It requires neither sequence alignments nor the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do

    A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast

    Get PDF
    Finding functional DNA binding sites of transcription factors (TFs) throughout the genome is a crucial step in understanding transcriptional regulation. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known TF motifs occur in the genome than are actually functional. However, information about chromatin structure may help to identify the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling TFs to bind DNA in those regions. Here, we describe a novel motif discovery algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy. When a Gibbs sampling algorithm is applied to yeast sequence-sets identified by ChIP-chip, the correct motif is found in 52% more cases with our informative prior than with the commonly used uniform prior. This is the first demonstration that nucleosome occupancy information can be used to improve motif discovery. The improvement is dramatic, even though we are using only a statistical model to predict nucleosome occupancy; we expect our results to improve further as high-resolution genome-wide experimental nucleosome occupancy data becomes increasingly available
    corecore